Deformable Model-Driven Neural Rendering for High-Fidelity 3D Reconstruction of Human Heads Under Low-View Settings
Reconstructing 3D human heads in low-view settings presents technical
challenges, mainly due to the pronounced risk of overfitting to the limited
views and the difficulty of recovering high-frequency signals. To address
this, we propose geometry decomposition
and adopt a two-stage, coarse-to-fine training strategy, allowing for
progressively capturing high-frequency geometric details. We represent 3D human
heads using the zero level-set of a combined signed distance field, comprising
a smooth template, a non-rigid deformation, and a high-frequency displacement
field. The template captures features that are independent of both identity and
expression and is co-trained with the deformation network across multiple
individuals with sparse and randomly selected views. The displacement field,
capturing individual-specific details, undergoes separate training for each
person. Our network training does not require 3D supervision or object masks.
Experimental results demonstrate the effectiveness and robustness of our
geometry decomposition and two-stage training strategy. Our method outperforms
existing neural rendering approaches in terms of reconstruction accuracy and
novel view synthesis under low-view settings. Moreover, the pre-trained
template serves as a good initialization for our model when encountering
unseen individuals.
Comment: Accepted by ICCV2023. Visit our project page at
https://github.com/xubaixinxbx/3dhead
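As a rough illustration of the decomposition described above, the sketch below composes the three fields as s(x) = T(x + D(x)) + H(x), where T is the smooth template SDF, D the non-rigid deformation, and H the high-frequency displacement. The exact composition, module sizes, and activation choices are assumptions made for clarity, not the authors' implementation.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=128, layers=3):
    # Small coordinate MLP; Softplus with a large beta is a common smooth
    # activation for SDF networks (an assumption here, not from the paper).
    mods, d = [], in_dim
    for _ in range(layers):
        mods += [nn.Linear(d, hidden), nn.Softplus(beta=100)]
        d = hidden
    mods.append(nn.Linear(d, out_dim))
    return nn.Sequential(*mods)

class CombinedSDF(nn.Module):
    def __init__(self):
        super().__init__()
        self.template = mlp(3, 1)   # identity/expression-independent template SDF
        self.deform = mlp(3, 3)     # non-rigid deformation, co-trained across people
        self.displace = mlp(3, 1)   # per-person high-frequency displacement (fine stage)

    def forward(self, x, coarse_only=False):
        x_canonical = x + self.deform(x)   # warp query points toward template space
        sdf = self.template(x_canonical)
        if not coarse_only:                # fine stage adds high-frequency detail
            sdf = sdf + self.displace(x)
        return sdf

# The surface is the zero level-set {x : s(x) = 0}. A coarse-to-fine schedule
# would first train template + deform across individuals, then freeze them and
# fit `displace` separately for each person.
sdf = CombinedSDF()
pts = torch.rand(1024, 3) * 2 - 1
values = sdf(pts)  # (1024, 1) signed distances
```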
UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human Generation
Human generation has achieved significant progress. Nonetheless, existing
methods still struggle to synthesize specific regions such as faces and hands.
We argue that the main reason is rooted in the training data. A holistic human
dataset inevitably has insufficient and low-resolution information on local
parts. Therefore, we propose to use multi-source datasets with various
resolution images to jointly learn a high-resolution human generative model.
However, multi-source data inherently a) contains different parts that do not
spatially align into a coherent human, and b) comes with different scales. To
tackle these challenges, we propose an end-to-end framework, UnitedHuman, that
empowers continuous GAN with the ability to effectively utilize multi-source
data for high-resolution human generation. Specifically, 1) we design a
Multi-Source Spatial Transformer that spatially aligns multi-source images to
full-body space with a human parametric model. 2) Next, a continuous GAN is
proposed with global-structural guidance and CutMix consistency. Patches from
different datasets are then sampled and transformed to supervise the training
of this scale-invariant generative model. Extensive experiments demonstrate
that our model, jointly learned from multi-source data, achieves superior
quality to models learned from a holistic dataset.
Comment: Accepted by ICCV2023. Project page: https://unitedhuman.github.io/
Github: https://github.com/UnitedHuman/UnitedHuma
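The patch-based supervision can be made concrete with a small sketch. The helpers below (`patch_grid`, `crop_aligned_patch`, `cutmix_consistency_loss` are hypothetical names) show one plausible way to crop scale-varying patches in a shared full-body coordinate frame and to apply a CutMix-style consistency penalty to a patch discriminator; the paper's actual spatial transformer and loss formulation may differ.

```python
import torch
import torch.nn.functional as F

def patch_grid(center, scale, size=64):
    """Sampling grid for a square patch in normalized [-1, 1] body space.
    center: (B, 2) patch centers; scale: (B,) patch side lengths."""
    lin = torch.linspace(-0.5, 0.5, size)
    yy, xx = torch.meshgrid(lin, lin, indexing="ij")
    grid = torch.stack([xx, yy], dim=-1).unsqueeze(0)  # (1, size, size, 2)
    return center[:, None, None, :] + scale[:, None, None, None] * grid

def crop_aligned_patch(image, center, scale, size=64):
    # Differentiably crop the image region corresponding to a body-space patch.
    # We assume the image already covers [-1, 1]^2; in the paper, the spatial
    # transformer with a human parametric model would supply this alignment.
    return F.grid_sample(image, patch_grid(center, scale, size),
                         align_corners=True)

def cutmix(a, b, box):
    # Paste a rectangular region of b into a.
    y0, y1, x0, x1 = box
    out = a.clone()
    out[..., y0:y1, x0:x1] = b[..., y0:y1, x0:x1]
    return out

def cutmix_consistency_loss(disc, patch_a, patch_b, box):
    # A common CutMix consistency form: a patch discriminator's per-pixel
    # scores on a mixed patch should match the mix of its per-patch scores.
    mixed = disc(cutmix(patch_a, patch_b, box))
    target = cutmix(disc(patch_a), disc(patch_b), box).detach()
    return F.mse_loss(mixed, target)
```

This assumes the discriminator outputs per-pixel score maps, so that mixing commutes between image space and score space.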
Urban Radiance Field Representation with Deformable Neural Mesh Primitives
Neural Radiance Fields (NeRFs) have achieved great success in the past few
years. However, most current methods still require intensive resources due to
ray marching-based rendering. To construct urban-level radiance fields
efficiently, we design the Deformable Neural Mesh Primitive (DNMP) and propose to
parameterize the entire scene with such primitives. The DNMP is a flexible and
compact neural variant of classic mesh representation, which enjoys both the
efficiency of rasterization-based rendering and the powerful neural
representation capability for photo-realistic image synthesis. Specifically, a
DNMP consists of a set of connected deformable mesh vertices with paired vertex
features to parameterize the geometry and radiance information of a local area.
To constrain the degree of freedom for optimization and lower the storage
budgets, we enforce the shape of each primitive to be decoded from a relatively
low-dimensional latent space. The rendering colors are decoded from the vertex
features (interpolated with rasterization) by a view-dependent MLP. The DNMP
provides a new paradigm for urban-level scene representation with appealing
properties: (1) High-quality rendering: our method achieves leading
performance for novel view synthesis in urban scenarios. (2) Low computational
costs: our representation enables fast rendering (2.07 ms/1k pixels) and low
peak memory usage (110 MB/1k pixels). We also present a lightweight version
that runs 33× faster than vanilla NeRFs and is comparable to the highly
optimized Instant-NGP (0.61 vs. 0.71 ms/1k pixels).
Project page: https://dnmp.github.io/
Comment: Accepted to ICCV2023
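To make the primitive concrete, here is a minimal PyTorch sketch of one DNMP: vertex positions are decoded from a low-dimensional latent (constraining the optimization, as described above), each vertex carries a feature vector, and a view-dependent MLP turns rasterization-interpolated features into color. All sizes (latent dimension, feature width, vertex count) and names are illustrative assumptions, and the rasterizer itself is stubbed out.

```python
import torch
import torch.nn as nn

class DNMP(nn.Module):
    def __init__(self, n_verts=162, latent_dim=8, feat_dim=32):
        super().__init__()
        self.latent = nn.Parameter(torch.zeros(latent_dim))  # per-primitive shape code
        self.vert_feats = nn.Parameter(torch.randn(n_verts, feat_dim) * 0.01)
        # Decoder from the low-dimensional latent to vertex offsets; keeping
        # shapes in this learned space limits degrees of freedom and storage.
        self.shape_decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, n_verts * 3))
        self.n_verts = n_verts

    def vertices(self, template_verts):
        offsets = self.shape_decoder(self.latent).view(self.n_verts, 3)
        return template_verts + offsets

class RadianceHead(nn.Module):
    """View-dependent MLP: interpolated vertex feature + view direction -> RGB."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, 64), nn.ReLU(), nn.Linear(64, 3), nn.Sigmoid())

    def forward(self, interp_feat, view_dir):
        return self.mlp(torch.cat([interp_feat, view_dir], dim=-1))

# Usage: a differentiable rasterizer would normally produce barycentric-
# interpolated per-pixel features; here we fake a single pixel for shape checks.
prim = DNMP()
template = torch.randn(162, 3)          # stand-in for a template mesh (e.g., icosphere)
verts = prim.vertices(template)         # deformed primitive geometry
head = RadianceHead()
feat = prim.vert_feats[:1]              # pretend one pixel hit vertex 0
rgb = head(feat, torch.tensor([[0.0, 0.0, 1.0]]))  # (1, 3) color in [0, 1]
```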